General Information


This post is a brief Situation Report (SITREP) on the genomic sequencing of the B.1.1.7 lineage of SARS-CoV-2 in the USA.

As of 01/14/2021, 14,627 sequences of this strain (see Table 1.1 for its definition) have been detected in over 38 countries. In the US, 73 sequences have been detected in over 8 states.

This report shows the prevalence of the strain at the state, national, and global level. The table on the left (Table 1.1) lists the key mutations that define the strain. The plot on the right (Figure 1.1), known as a root-to-tip plot, shows the genetic distance of B.1.1.7 genomes and of genomes from the lineage's immediate outgroup over time.

Table 1.1: Key mutations that define the strain
Gene   | Nucleotide mutations                                          | Amino acid changes
ORF1ab | C3266T, T6953C, C5387A, 11288:11296 deletion                  | T1001I, I2230T, A1708D
S      | A23062T, C23270A, A23402G, C23603A, C23708T, T24505G, G24913C | HV69-70del, Y144del, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H
N      | GAT28279CTA, C28976T                                          | D3L, S235F
ORF8   | G28047T, C27971T, A28110G                                     | R52I, Q27* (stop), Y73C
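To make the idea of "defining mutations" concrete, here is a minimal sketch that checks how many of the spike nucleotide substitutions from Table 1.1 a sample carries. The names `DEFINING_S_MUTATIONS`, `defining_fraction` and `looks_like_b117`, the input format (a list of mutation strings in the table's notation) and the 0.8 cutoff are all hypothetical; actual lineage assignment is done with dedicated tools such as pangolin, not a simple mutation count.

```python
# Hypothetical sketch: how many B.1.1.7-defining spike substitutions does a genome carry?
# Mutation strings use the same notation and positions as Table 1.1.
DEFINING_S_MUTATIONS = frozenset({
    "A23062T", "C23270A", "A23402G", "C23603A",
    "C23708T", "T24505G", "G24913C",
})


def defining_fraction(sample_mutations, defining=DEFINING_S_MUTATIONS):
    """Return the fraction of `defining` mutations found in `sample_mutations`."""
    present = defining & set(sample_mutations)
    return len(present) / len(defining)


def looks_like_b117(sample_mutations, threshold=0.8):
    """Flag a sample carrying at least `threshold` of the defining mutations.

    The 0.8 cutoff is an illustrative assumption, not an official lineage rule.
    """
    return defining_fraction(sample_mutations) >= threshold


# Usage: six of the seven defining substitutions plus an unrelated one.
sample = ["A23062T", "C23270A", "A23402G", "C23603A", "C23708T", "T24505G", "C241T"]
print(round(defining_fraction(sample), 3))  # 0.857
print(looks_like_b117(sample))              # True
print(looks_like_b117(["A23402G"]))         # False
```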
Figure 1.1: Genetic distance from the root (root-to-tip), a measure of accumulated evolutionary change, for strain and related non-strain samples, excluding other well-known variants of concern (VOCs) such as B.1.351.
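A root-to-tip plot is commonly summarized by a linear regression of each genome's distance from the tree root against its sampling date (the approach popularized by tools such as TempEst): the slope approximates the rate at which changes accumulate, and the x-intercept gives a rough date for the root. The sketch below fits that line by ordinary least squares on (date, distance) pairs; the function name, input format and data are assumptions, and this is not the pipeline used to produce Figure 1.1.

```python
from datetime import date, timedelta


def root_to_tip_fit(samples):
    """Least-squares fit of root-to-tip distance against sampling date.

    samples: list of (collection_date, distance) pairs, where distance is the
    genetic distance from the tree root (hypothetical input format).
    Needs at least two distinct dates and a non-zero slope.
    Returns (slope per day, fitted distance at the earliest date, estimated root date).
    """
    origin = min(d for d, _ in samples)
    xs = [(d - origin).days for d, _ in samples]   # days since earliest sample
    ys = [dist for _, dist in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # distance units gained per day
    intercept = mean_y - slope * mean_x    # fitted distance at `origin`
    # The fitted line reaches zero distance at x = -intercept / slope.
    root_date = origin + timedelta(days=-intercept / slope)
    return slope, intercept, root_date


# Usage with made-up points that lie exactly on a line.
points = [(date(2020, 1, 1), 5.0), (date(2020, 1, 11), 6.0), (date(2020, 1, 21), 7.0)]
slope, intercept, root = root_to_tip_fit(points)
print(slope, intercept, root)  # approximately 0.1, 5.0, 2019-11-12
```

In such a plot, a group of genomes lying consistently above the line fitted to related samples carries more changes than expected for its sampling date.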

State Prevalence


Figure 2.1 shows the spatial (geographical) prevalence of the strain across California.
Figure 2.2 shows the temporal (over time) prevalence of the strain across California.
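Spatial prevalence here means, for each location, the share of sequenced genomes that belong to the lineage. A minimal sketch, assuming each sequence record is a dict with hypothetical "location" and "lineage" keys (real GISAID or GenBank metadata use their own column names) and using made-up records:

```python
from collections import Counter


def prevalence_by_location(records, lineage="B.1.1.7"):
    """Share of sequenced genomes in each location that belong to `lineage`.

    records: iterable of dicts with hypothetical "location" and "lineage" keys.
    The denominator is sequenced genomes, not cases, so the result depends on
    how samples were chosen for sequencing.
    """
    totals = Counter()
    hits = Counter()
    for rec in records:
        totals[rec["location"]] += 1
        if rec["lineage"] == lineage:
            hits[rec["location"]] += 1
    return {loc: hits[loc] / n for loc, n in totals.items()}


# Usage with made-up records.
records = [
    {"location": "County A", "lineage": "B.1.1.7"},
    {"location": "County A", "lineage": "B.1.1.7"},
    {"location": "County A", "lineage": "other"},
    {"location": "County A", "lineage": "other"},
    {"location": "County B", "lineage": "other"},
]
print(prevalence_by_location(records))  # {'County A': 0.5, 'County B': 0.0}
```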

National Prevalence


Figure 3.1 shows the spatial (geographical) prevalence of the strain across the US.
Figure 3.2 shows the temporal (over time) prevalence of the strain across the US.
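Temporal prevalence can be tabulated by bucketing sequences by ISO week. Because weekly counts are small (73 US sequences in total at the time of writing), it helps to attach an uncertainty band; the sketch below uses the Wilson score interval for a binomial proportion. The "date" and "lineage" keys, the function names and the records are assumptions for illustration.

```python
import math
from collections import Counter
from datetime import date


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for the proportion k/n (about 95% coverage for z=1.96)."""
    p = k / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return max(0.0, center - half), min(1.0, center + half)


def weekly_prevalence(records, lineage="B.1.1.7"):
    """Map (ISO year, ISO week) -> (hits, total, share, (low, high)).

    records: iterable of dicts with hypothetical "date" (datetime.date) and "lineage" keys.
    """
    totals, hits = Counter(), Counter()
    for rec in records:
        iso = rec["date"].isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if rec["lineage"] == lineage:
            hits[week] += 1
    return {
        week: (hits[week], totals[week], hits[week] / totals[week],
               wilson_interval(hits[week], totals[week]))
        for week in sorted(totals)
    }


# Usage with made-up records: 1 of 2 in ISO week 1 of 2021, 1 of 1 in week 2.
records = [
    {"date": date(2021, 1, 4), "lineage": "B.1.1.7"},
    {"date": date(2021, 1, 6), "lineage": "other"},
    {"date": date(2021, 1, 11), "lineage": "B.1.1.7"},
]
for week, (k, n, share, (lo, hi)) in weekly_prevalence(records).items():
    print(week, k, n, share, round(lo, 3), round(hi, 3))
```

With one or two sequences per week the interval spans most of the range from 0 to 1, which is a reminder of how little a handful of weekly sequences can say on its own.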

Global Prevalence

More detailed information on the global prevalence of SARS-CoV-2 strains of concern can be found in this post.

Figure 4.1 shows the spatial (geographical) prevalence of the strain across the world.
Figure 4.2 shows the temporal (over time) prevalence of the strain across the world.
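A complementary way to track spread across the world over time is to count, for each date, how many countries have reported at least one genome of the lineage. A minimal sketch, assuming hypothetical "country", "date" and "lineage" keys, with made-up country labels and dates rather than real detection data:

```python
from datetime import date


def cumulative_countries(records, lineage="B.1.1.7"):
    """Return [(date, number of countries with at least one detection by that date)].

    records: iterable of dicts with hypothetical "country", "date" and "lineage" keys.
    """
    first_seen = {}
    for rec in records:
        if rec["lineage"] != lineage:
            continue
        country, d = rec["country"], rec["date"]
        if country not in first_seen or d < first_seen[country]:
            first_seen[country] = d

    timeline = []
    for count, d in enumerate(sorted(first_seen.values()), start=1):
        if timeline and timeline[-1][0] == d:
            timeline[-1] = (d, count)  # several countries first reported on the same day
        else:
            timeline.append((d, count))
    return timeline


# Usage with made-up records.
records = [
    {"country": "A", "date": date(2020, 10, 2), "lineage": "B.1.1.7"},
    {"country": "A", "date": date(2020, 9, 21), "lineage": "B.1.1.7"},
    {"country": "B", "date": date(2020, 11, 14), "lineage": "B.1.1.7"},
    {"country": "C", "date": date(2020, 12, 29), "lineage": "B.1.1.7"},
    {"country": "D", "date": date(2020, 12, 29), "lineage": "B.1.1.7"},
    {"country": "E", "date": date(2020, 8, 1), "lineage": "other"},
]
print(cumulative_countries(records))
```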

Notes on Sampling


As Figure 3.2 indicates, the B.1.1.7 genomes found in the US so far were not the result of unbiased sequencing; they were identified through S-gene target failure (SGTF) in community-based diagnostic PCR testing. Because this sampling was targeted rather than unbiased, the counts do not indicate the true prevalence of the B.1.1.7 lineage in the US. They only tell us that the lineage is present in the US.
P.S.: Estimates of the true prevalence in the US are discussed in this post.
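To make the sampling bias concrete: on assays such as the TaqPath COVID-19 Combo Kit, which target ORF1ab, N and S, SGTF means the S target is not detected while the ORF1ab and N targets are. The 69-70 deletion in B.1.1.7 causes this dropout, although other lineages carrying the same deletion can trigger it too. The sketch below flags SGTF from Ct values (with an assumed Ct cutoff of 30 for the other two targets) and compares the lineage share across a made-up set of positives with the share among only the SGTF-flagged samples. The function names, the cutoff and the data are hypothetical.

```python
def is_sgtf(ct_orf1ab, ct_n, ct_s, max_ct=30.0):
    """Flag S-gene target failure from PCR cycle-threshold (Ct) values.

    None means the target was not detected. ORF1ab and N must both be detected
    at or below `max_ct` (an illustrative cutoff), while S is not detected.
    """
    others_detected = all(ct is not None and ct <= max_ct for ct in (ct_orf1ab, ct_n))
    return others_detected and ct_s is None


def lineage_share(lineages, lineage="B.1.1.7"):
    """Fraction of entries in `lineages` equal to `lineage`."""
    return sum(1 for lin in lineages if lin == lineage) / len(lineages)


# Made-up positives: (lineage, Ct ORF1ab, Ct N, Ct S).
positives = (
    [("B.1.1.7", 22.0, 23.5, None)] * 2
    + [("other", 24.0, 25.0, 24.5)] * 98
)

all_lineages = [lin for lin, *_ in positives]
sgtf_lineages = [lin for lin, o, n, s in positives if is_sgtf(o, n, s)]

print(lineage_share(all_lineages))   # 0.02: share among all positives
print(lineage_share(sgtf_lineages))  # 1.0: share among SGTF-selected samples
```

Sequencing only SGTF-flagged samples makes the lineage look dominant no matter how rare it is among all positives, which is why such counts show presence rather than prevalence.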

The figure above is a simple illustration of how genomic surveillance of COVID-19 samples can give us an increasingly clear picture of how the virus is evolving and spreading. Each panel is an electron microscopy image of SARS-CoV-2 that has been "crappified" (degraded with salt & pepper noise) to a degree that depends on the rate of COVID-19 sequencing at each location. For reference, the clear image on the right indicates that a 5% genomic sampling rate would be an ideal first objective for observing statistically significant phenomena.
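The degradation in the figure can be mimicked by replacing a fraction of pixels with black or white ("salt & pepper" noise), with that fraction shrinking as the sequencing rate (sequenced genomes divided by confirmed cases) approaches the 5% target. The text does not say how the sequencing rate was mapped to noise, so the linear mapping in `noise_fraction` is an assumption, as are all names; the image is a plain grayscale list of lists to keep the sketch dependency-free.

```python
import random

TARGET_RATE = 0.05  # the 5% genomic sampling objective mentioned in the text


def sequencing_rate(sequenced, confirmed_cases):
    """Fraction of confirmed cases whose samples were sequenced."""
    return sequenced / confirmed_cases


def noise_fraction(rate, target=TARGET_RATE):
    """Hypothetical mapping: no noise at or above `target`, rising linearly to 1 at 0%."""
    return max(0.0, 1.0 - rate / target)


def salt_and_pepper(image, fraction, seed=0):
    """Return a noisy copy of a grayscale image (list of lists, values 0-255).

    Each pixel is replaced, with probability `fraction`, by 0 (pepper) or 255 (salt).
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for px in row:
            if rng.random() < fraction:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(px)
        noisy.append(new_row)
    return noisy


# Usage: a flat gray test image at a 1% sequencing rate (about 80% of pixels replaced).
image = [[128] * 6 for _ in range(4)]
rate = sequencing_rate(100, 10_000)
noisy = salt_and_pepper(image, noise_fraction(rate), seed=42)
```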

Comments


Research laboratories across the US are encouraged to contribute to COVID-19 genomic sequencing efforts. More detailed information can be found here.

Specifically, when uploading genomes from the B.1.1.7 lineage to GISAID or GenBank, please indicate whether the sample was identified via S-gene target failure (SGTF). This can be recorded under the "purpose_of_sequencing" field (GISAID) or the "Additional host information" field (GenBank). This will help in identifying the true prevalence of the lineage across the country.
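Labeling matters because it lets analysts set SGTF-selected genomes aside when estimating prevalence. A minimal sketch, assuming each record carries the lineage and the text of a "purpose_of_sequencing"-style field, and assuming (hypothetically) that SGTF-selected samples are marked by the substring "SGTF" in that text:

```python
def baseline_share(records, lineage="B.1.1.7", targeted_marker="SGTF"):
    """Lineage share among genomes that were NOT selected via SGTF screening.

    records: dicts with "lineage" and (optionally) "purpose_of_sequencing" keys.
    The marker string is a hypothetical convention. Records with no purpose
    recorded are counted as baseline, so unlabeled SGTF samples bias the
    estimate upward.
    """
    baseline = [
        r for r in records
        if targeted_marker not in (r.get("purpose_of_sequencing") or "")
    ]
    if not baseline:
        return None  # nothing unbiased to estimate from
    return sum(1 for r in baseline if r["lineage"] == lineage) / len(baseline)


# Usage with made-up records: 3 SGTF-targeted, 10 baseline (1 of them B.1.1.7).
records = (
    [{"lineage": "B.1.1.7", "purpose_of_sequencing": "Targeted: SGTF screening"}] * 3
    + [{"lineage": "B.1.1.7", "purpose_of_sequencing": "Baseline surveillance"}]
    + [{"lineage": "other", "purpose_of_sequencing": "Baseline surveillance"}] * 9
)
print(baseline_share(records))  # 0.1
```

If the three targeted samples had been uploaded without a label, the same function would report 4/13 (about 0.31) instead of 0.1.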